Fix embedding edge cases: dynamic batch sizing and token limit fallback #7
drewburchfield merged 2 commits into master from
Conversation
- Replace hardcoded `batch_size=60` with dynamic calculation based on chunk density (3 chars/token estimate, 28k token budget)
- Add fallback: if whole-note embed fails on token limit, retry with chunking
- Don't retry deterministic token limit errors in `_call_api_with_retry`
- Extract shared `_is_token_limit_error()` helper for consistent detection
- Update stale docs: TOOLS.md limitations, vector_store.py comment

Closes NAS-989, NAS-993
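The dynamic batch sizing described above might look roughly like the following sketch. The function name and structure are assumptions for illustration, not the actual `src/embedder.py` code; only the constants (3 chars/token estimate, 28k token budget) come from the description.

```python
# Hypothetical sketch of dynamic batch sizing based on chunk density.
# Constants mirror the PR description: ~3 chars/token, 28k token budget.
CHARS_PER_TOKEN = 3
TOKEN_BUDGET = 28_000

def compute_batch_size(chunks: list[str]) -> int:
    """Pick a batch size so the estimated total tokens stay under budget."""
    if not chunks:
        return 1
    avg_chars = sum(len(c) for c in chunks) / len(chunks)
    est_tokens_per_chunk = max(1, avg_chars / CHARS_PER_TOKEN)
    # Denser chunks (more chars, hence more tokens) yield smaller batches.
    return max(1, int(TOKEN_BUDGET // est_tokens_per_chunk))
```

Dense notes shrink the batch automatically, while sparse notes still get large, efficient batches.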
If a chunk batch exceeds the token limit, halve the batch size and retry the same position. Keeps halving until it fits or batch_size reaches 1. This handles dense content where the 3 chars/token estimate is still too generous.
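The halve-and-retry behavior can be sketched as below. `embed_all`, `embed_batch`, and the minimal `EmbeddingError` / `_is_token_limit_error` stand-ins are hypothetical stand-ins for the real embedder internals, not the actual `src/embedder.py` code.

```python
class EmbeddingError(Exception):
    """Stand-in for the embedder's error type."""

def _is_token_limit_error(e: Exception) -> bool:
    # Stand-in: the real helper inspects the provider's error payload.
    return "token limit" in str(e)

def embed_all(chunks, batch_size, embed_batch):
    """Embed chunks in batches, halving batch_size on token-limit errors."""
    embeddings = []
    i = 0
    while i < len(chunks):
        chunk_batch = chunks[i : i + batch_size]
        try:
            embeddings.extend(embed_batch(chunk_batch))
            i += len(chunk_batch)  # advance only after success
        except EmbeddingError as e:
            if _is_token_limit_error(e) and batch_size > 1:
                batch_size //= 2  # retry the same position with a smaller batch
            else:
                raise
    return embeddings
```

Note the position `i` does not advance on failure, so the same chunks are retried at the smaller size until they fit or `batch_size` reaches 1.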
```python
                logger.debug(f"Embedded chunks {i + 1}-{i + len(chunk_batch)} of {len(chunks)}")
            except EmbeddingError as e:
                if _is_token_limit_error(e) and batch_size > 1:
```
🟡 Batch retry checks batch_size > 1 instead of len(chunk_batch) > 1, causing wasteful identical API calls
When the remaining chunks at position i are fewer than batch_size, reducing batch_size does not change the actual chunk_batch sent to the API. The loop at src/embedder.py:229 checks batch_size > 1 to decide whether to retry with a smaller batch, but the actual batch sent is chunks[i : i + batch_size] (src/embedder.py:207), which is bounded by the remaining chunks. This means the code can make multiple identical failing API calls (each with a rate-limit sleep at src/embedder.py:210) before batch_size shrinks below the actual remaining chunk count.
Example scenario: 5 remaining chunks with batch_size=42
- `batch_size=42` → `chunk_batch` = 5 chunks → API fails (token limit)
- `batch_size=21` → `chunk_batch` = 5 chunks → same request, fails again
- `batch_size=10` → `chunk_batch` = 5 chunks → same request, fails again
- `batch_size=5` → `chunk_batch` = 5 chunks → same request, fails again
- `batch_size=2` → `chunk_batch` = 2 chunks → different request, may succeed
That's 4 wasted API calls with rate-limit delays.
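A small model of the loop reproduces that count. `identical_retries` is a hypothetical simplification of the retry loop (assuming every batch of the full remaining size fails on token limit), not the actual `src/embedder.py` code:

```python
def identical_retries(remaining: int, batch_size: int) -> int:
    """Count failing API calls that send the *same* chunk_batch before
    batch_size halves below the remaining chunk count.

    Hypothetical model of the retry loop; assumes every batch containing
    all `remaining` chunks fails on token limit.
    """
    calls = 0
    while True:
        # len(chunks[i : i + batch_size]) is bounded by the remaining chunks
        chunk_batch = min(batch_size, remaining)
        if chunk_batch < remaining:
            return calls  # a smaller, different request is finally sent
        calls += 1  # identical failing request (plus its rate-limit sleep)
        if batch_size > 1:
            batch_size //= 2
        else:
            return calls
```

For the scenario above, `identical_retries(5, 42)` counts the four identical failing calls made before `batch_size` drops to 2 and a different request goes out.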
Suggested change:

```diff
-                    if _is_token_limit_error(e) and batch_size > 1:
+                    if _is_token_limit_error(e) and len(chunk_batch) > 1:
```
Closes NAS-989
Closes NAS-993
Summary
- Replace hardcoded `batch_size=60` with dynamic calculation based on chunk density (3 chars/token, 28k token budget)
- Don't retry deterministic token limit errors in `_call_api_with_retry`
- Extract shared `_is_token_limit_error()` helper for consistent detection across call sites

Root cause
Two edge cases in `embed_with_chunks()`:

- `contextualized_embed` pushes against Voyage's 32k token limit when content is dense

Test plan